Method For Separating Mixed Signals Into A Plurality Of Component Signals

ABSTRACT

The invention relates to a method for breaking down a set of signals consisting of an instantaneous linear mixture of independent components. The method comprises a step ( 200 ) consisting in estimating a mixture matrix A by using the product of one or more trigonometric rotation matrices by one or more hyperbolic rotation matrices.

The present invention relates to the general technical field of signal processing, and more specifically of separating source signals from mixtures of source signals by using blind separation of sources.

The present invention may find many applications. It may notably be applied to processing electroencephalograms, to signal processing in the field of radio communications, or further to the processing of mass spectra.

GENERAL PRESENTATION OF THE PRIOR ART

The problem of separating sources is a relatively recent problem in signal processing, which consists of separating statistically independent sources, from several mixtures recorded either by a network of sensors at a given instant (space dispersion).

It frequently happens that an observed physical phenomenon is the superposition of several independent elementary phenomena and that this superposition is expressed by an instantaneous linear combination (or mixture).

The measurement data from the observation of this phenomenon are then an instantaneous linear mixture of the contributions from each independent elementary phenomenon.

By placing a plurality of sensors, it is then possible to measure several different instantaneous linear mixtures of these contributions. For example, in electroencephalography, various electrodes are placed on the scalp of a patient. These electrodes enable the cerebral activity of the brain of the patient to be measured.

Each electrode measures a mixture of electric signals emitted by various sources of the brain, the different electrodes measuring different mixtures. Mixtures of sources are said to be different when the sources appear therein in different proportions or with different weights.

In this case, no other information is known:

-   -   on the sources and     -   on how the signals from these sources mix.

Source separation is then used for identifying the weights of the mixture in a first phase, and for separating in the measurements the contribution of each source in a second phase. This allows separate analysis of the different contributions without being disturbed by the mixture.

Presently there are two types of methods for source separation.

The first type of methods uses a preliminary step, a so-called <<whitening>> step, with which the subsequent computational steps may be facilitated in order to identify the weights of the mixture. However, this whitening step induces an error which cannot be corrected upon the separation of the contribution of each source. This type of methods is therefore not very accurate.

The second type of methods does not require any whitening step; however this second type of method induces problems in terms of efficiency and computational cost.

An object of the present invention is to propose a method for separating mixed signals into a plurality of component signals with which at least one of the drawbacks of the aforementioned types of methods may be overcome.

PRESENTATION OF THE INVENTION

For this purpose, a method is provided for separating mixed signals (x₁(t), . . . , x_(N)(t)) recorded by at least two sensors into a plurality of component signals (s₁(t), . . . , s_(M)(t)), the mixed signals (x₁(t), . . . , x_(N)(t)) corresponding to N linear combinations of M component signals (s₁(t), . . . , s_(M)(t)) stemming from independent sources, each mixed signal being able to be written as the product of a mixture matrix (A) multiplied by a vector of the component signals from the sources (s₁(t), . . . , s_(M)(t)), the method further comprising a joint diagonalization step in which the mixture matrix A is estimated from a product of rotation matrices corresponding to one or more Givens rotation matrices (G(θ)) and to one or more “hyperbolic rotation” matrices (H(φ)).

Within the scope of the present invention, by “hyperbolic rotation matrix”, is meant a rotation matrix including terms which are hyperbolic functions (for example, cos h, sin h) and having a determinant equal to 1.

Thus, with the invention, the mixture matrix may be identified by a single fast and reliable optimization step. The fact that there is only one single step provides a more accurate result than the two-step methods (whitening followed by joint diagonalization) such as in [1], [1b], [2]. Indeed, the unavoidable errors of the first step cannot be corrected by the second step. Further, the proposed method uses an algebraic algorithm (these are products of 2×2 size “trigonometric” and “hyperbolic” rotation matrices), which is particularly robust and not very costly in terms of computational power.

The methods comprising a whitening step build an orthogonal mixture matrix by the product of elementary orthogonal matrices (Givens rotations).

As for the present invention, it allows a mixture matrix to be built with a determinant equal to 1 by the multiplication of matrices with a determinant equal to 1. The group of orthogonal matrices (with a positive determinant +1) is included in the group of matrices with a determinant equal to 1. The steps for computing the mixture matrix are therefore facilitated with the method of the present invention which allows the initial constraints on the mixtures to be limited. The initial “whitening” step is no longer essential but the robustness and efficiency of “algebraic” methods are preserved.

Consequently, the method of the present invention enables the computation of a mixture matrix that is closer to the physical reality of the mixture, than the mixture matrix obtained with the method comprising a whitening step.

Preferred but non-limiting aspects of the method according to the invention are the following:

-   -   Givens rotation matrices include elements corresponding to         trigonometric functions of θ and hyperbolic rotation matrices         include elements corresponding to hyperbolic functions of φ;     -   the diagonalization step comprises a step for multiplying a         family of estimated matrices ({circumflex over (R)}_(k)) by the         product of a Givens rotation matrix and of a hyperbolic rotation         matrix on the right, and the transposed matrix of this product         on the left; and for computing the values of θ_(n) and of φ_(n)         in order to minimize the non-diagonal terms of the resulting         matrices of order n of this multiplication;     -   the multiplication step is repeated on the resulting matrices of         order n;     -   the multiplication step is repeated until the Givens angle θ and         the “hyperbolic angle” φ are such that cos θ et cos hφ tend to 1         respectively;     -   the estimated matrices {circumflex over (R)}_(k) are (to within         the estimation error) equal to the product of the mixture matrix         by a diagonal matrix by the transposed mixture matrix         (R_(k)=AD_(k)A^(T)), these are for example matrices of         cumulants, or covariance matrices in different frequency bands,         or intercorrelation matrices with different time shifts, or         covariance matrices estimated over different time intervals;     -   the Givens rotation matrices comprise cos θ elements on the         diagonal, a sin θ element below the diagonal and a −sin θ         element above the diagonal, and the “hyperbolic rotation”         matrices comprise cos hφ elements on the diagonal, sin hφ         element below the diagonal, and sin hφ element above the         diagonal;     -   the Givens rotation matrices are of the type

$\quad\begin{pmatrix} {\cos \; \theta} & {{- \sin}\; \theta} \\ {\sin \; \theta} & {\cos \; \theta} \end{pmatrix}$

and the “hyperbolic rotation” matrices are of the type

$\quad{\begin{pmatrix} {\cosh \; \phi} & {\sinh \; \phi} \\ {\sinh \; \phi} & {\cosh \; \phi} \end{pmatrix}.}$

The invention also relates to a device for separating mixed signals recorded by at least two sensors into a plurality of component signals, each mixed signal corresponding to an instantaneous linear combination of component signals stemming from independent sources, each mixed signal being able to be written as the product of a mixture matrix multiplied by a vector of component signals stemming from the sources, characterized in that the device further comprises means for joint diagonalization in which the mixture matrix is estimated from a product of rotation matrices corresponding to one or more Givens rotation matrices and to one or more hyperbolic rotation matrices.

The invention also relates to a computer program product comprising program code instructions recorded on a medium which may be used in a computer, characterized in that it comprises instructions for applying the method described above.

PRESENTATION OF THE FIGURES

Other characteristics, objects and advantages of the present invention will further be apparent from the description which follows, which is purely an illustration and not a limitation, and it should be read with reference to the appended figures in which:

FIG. 1 illustrates an embodiment of the method according to the invention,

FIG. 2 is a diagram illustrating the convergence toward diagonality as a function of a number of sweeps with 50 independent trials with sets of K=50 matrices of size N=50 for three different methods (J-Di, FFDiag and LUJ1D) for separating mixed signals;

FIG. 3 is a diagram illustrating the convergence of the separation criterion as a function of a number of sweeps with 50 independent trials with sets of K=50 matrices of size N=50 for three different methods (J-Di, FFDiag and LUJ1D) for separating mixed signals;

FIG. 4 is a diagram illustrating the convergence of the separation criterion as a function of a number of sweeps with 50 independent trials, K=N(N+1)/2=210 fourth order cumulant slices N=20 and σ=0.1;

FIG. 5 is a representation of an embodiment of the system for separating mixed signals according to the invention.

DESCRIPTION OF THE INVENTION

The following algebraic properties known from the skilled person are used hereafter:

-   -   Transposed of a product of 2 matrices A and B:

(AB)^(T) =B ^(T) A ^(T)

-   -   Inverse of the product of two matrices A and B:

(AB)⁻¹ =B ⁻¹ A ⁻¹

-   -   Givens rotation:

G(θ)⁻¹ =G(θ)^(T) =G(−θ)

-   -   Hyperbolic rotation:

H(φ)⁻¹ =H(−φ)

-   -   Symmetry of Hyperbolic rotation:

H(φ)=H(φ)^(T)

An embodiment of the method according to the invention will now be described in more detail with reference to FIG. 1.

Breaking down a mixture into independent components is of interest because it facilitates the interpretation, the analysis or the processing of each of the phenomena which concealed one another in the mixture.

These may be for example:

-   -   various (cerebral and muscular) electric phenomena contributing         to an electroencephalographic signal (EEG),     -   various radio transmitters in a narrow frequency band,     -   light emissions from different fluorochromes present in a         solution,     -   mass spectra of different mixtures of the same molecules.         By separating the different contributions, it is then possible:     -   to study the EEG signal rid of muscular parasites (blinks, ECG,         etc.),     -   to demodulate the signal from each radiofrequency source         separately,     -   to estimate the concentration of each fluorochrome (by cytometry         for example) or     -   to identify the different molecules present in the medium         studied through mass spectrometry.

The signal of a source depends on a variable quantity which may be time (EEG, radiofrequency), wavelength (fluorescence), mass (mass spectrometry), etc.

In the following, the method according to the invention will be presented while considering that the signals from the sources depend on time without this restricting the general nature of the method.

The problem of separating sources is mathematically modeled in the following way.

Each of the M sources emits a signal s_(m)(t) independent of the signals emitted by the other sources.

The mixture (at a given instant t) of these M contributions is observed by a network of N sensors (where N>M or N=M) which provide N distinct mixtures.

The data observed at an instant t, noted as x(t), are therefore multidimensional.

The vector of these N observations at a given instant may be written as the product of a matrix A (N lines, M columns), a so-called “mixture matrix”, by the vector s(t) of the source signals, i.e.:

$\begin{pmatrix} {x_{1}(t)} \\ {x_{2}(t)} \\ \vdots \\ \vdots \\ {x_{N}(t)} \end{pmatrix} = {{\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1\; M} \\ a_{21} & a_{22} & \cdots & a_{2\; M} \\ \vdots & \ddots & \ddots & \vdots \\ \vdots & \; & \ddots & \vdots \\ a_{N\; 1} & a_{N\; 2} & \cdots & a_{NM} \end{pmatrix}\begin{pmatrix} {s_{1}(t)} \\ {s_{2}(t)} \\ \vdots \\ {s_{M}(t)} \end{pmatrix}{i.e.\mspace{14mu} {x(t)}}} = {{As}(t)}}$

The contributions s_(m)(t) of the different sources to the measurements x(t) may be shown by reformulating the previous equation:

$\begin{pmatrix} {x_{1}(t)} \\ {x_{2}(t)} \\ \vdots \\ \vdots \\ {x_{N}(t)} \end{pmatrix} = {{\sum\limits_{m = 1}^{m = M}\; {\begin{pmatrix} a_{1\; m} \\ a_{2\; m} \\ \vdots \\ \vdots \\ a_{Nm} \end{pmatrix}{s_{m}(t)}}} = {\sum\limits_{m = 1}^{m = M}{a_{m}{s_{m}(t)}}}}$

The method according to the invention allows the mixture matrix A to be estimated without any a priori information on this matrix (it is only supposed to be non-singular) and the source signals may then be estimated.

The source signals are supposed to be independent and some of their statistical properties are utilized such as their non-Gaussian character, their spectral differences or their non-stationary character for example.

In a first step 100, a family of K estimated matrices {circumflex over (R)}_(k) is calculated (for k comprised between 1 and K) from data recorded by the sensor network.

These estimated matrices {circumflex over (R)}_(k) are equal (to within the estimation error) to matrices R_(k) which have the following structure:

$R_{k} = {{{AD}_{k}A^{T}} = {\sum\limits_{m = 1}^{m = M}{{D_{k}\left( {m,m} \right)}a_{m}a_{m}^{T}}}}$

wherein D_(k) are diagonal matrices, and wherein A^(T) is the transposed mixture matrix A. Of course, one skilled in the art will understand that the matrices D_(k) may be diagonal blockwise.

The 3^(rd) term of the equation is only a reformulation of the 2^(nd) one which shows the R_(k) matrices as weighted sums of M matrices of rank 1 (the a_(m)a_(m) ^(T)). The term D_(k)(m,m)a_(m)a_(m) ^(T) is the contribution of the source m to the matrix R_(k).

Depending on these contexts, these estimated matrices {circumflex over (R)}_(k) may be matrices of cumulants of order 3 or 4 (JADE method [1], [1b]), covariance matrices in different frequency bands [3], intercorrelation matrices with different time shifts (SOBI method [2]) or covariance matrices estimated over different time intervals [3]. Of course, these estimated matrices may be of other types known to one skilled in the art.

In a second step 200, joint diagonalization of the estimated matrices is carried out as described in more detail in the following.

The method according to the invention allows the A matrix to be calculated from the {circumflex over (R)}₁, . . . , {circumflex over (R)}_(k), . . . {circumflex over (R)}_(K) matrices.

A matrix B is sought such that all the matrices B {circumflex over (R)}_(k)B^(T) are “as diagonal” as possible:

B{circumflex over (R)}_(k)B^(T)=D_(k)

B is built as a product of elementary N×N matrices of the following form:

${M\left( {i,j,\theta,\phi} \right)} = {\begin{matrix} i \\ j \end{matrix}\begin{pmatrix} I_{i - 1} & \; & \; & \; & \; \\ \; & \alpha & \; & \beta & \; \\ \; & \; & I_{j - i - 1} & \; & \; \\ \; & \gamma & \; & \delta & \; \\ \; & \; & \; & \; & I_{N - j} \end{pmatrix}}$

wherein i<j are any two indices between 1 and N, wherein I_(p) designates the identity matrix of size p×p and wherein

$\quad\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$

is the product of a Givens matrix of the

$\quad\begin{pmatrix} {\cos \; \theta} & {{- \sin}\; \theta} \\ {\sin \; \theta} & {\cos \; \theta} \end{pmatrix}$

type (wherein θ is an angle of the Givens rotation) by a hyperbolic rotation matrix of the

$\quad\begin{pmatrix} {\cosh \; \phi} & {\sinh \; \phi} \\ {\sinh \; \phi} & {\cosh \; \phi} \end{pmatrix}$

type (wherein φ is the hyperbolic rotation angle) so that:

$\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} = {\begin{pmatrix} {\cos \; \theta} & {{- \sin}\; \theta} \\ {\sin \; \theta} & {\cos \; \theta} \end{pmatrix} \times \begin{pmatrix} {\cosh \; \phi} & {\sinh \; \phi} \\ {\sinh \; \phi} & {\cosh \; \phi} \end{pmatrix}}$

Thus, a difference with the diagonalization methods of the prior art is that the matrix B (and therefore the mixture matrix A) is estimated from a product of rotation matrices with one or more Givens rotation matrices and with one or more hyperbolic rotation matrices.

By “hyperbolic rotation matrix” is meant a matrix including elements corresponding to hyperbolic functions of φ.

More particularly, the family of estimated matrices {circumflex over (R)}_(k) is multiplied by:

-   -   the product of a Givens rotation matrix and a hyperbolic         rotation matrix on the right of each estimated matrix         {circumflex over (R)}_(k). It is obvious for one skilled in the         art that an equivalent implementation of the method would result         from an inversion of the rotations (hyperbolic and then Givens         rotation instead of Givens and then hyperbolic rotation).     -   and by the transposed matrix of this product on the left of each         estimated matrix {circumflex over (R)}_(k):

H₀^(T)(ϕ₀) × G₀^(T)(θ₀) × R̂₁ × G₀(θ₀) × H₀(ϕ₀) = D₁               … ${{H_{0}^{T}\left( \phi_{0} \right)} \times {G_{0}^{T}\left( \theta_{0} \right)} \times {\hat{R}}_{K} \times {G_{0}\left( \theta_{0} \right)} \times {H_{0}\left( \phi_{0} \right)}} = {{D_{K}{{with}\mspace{14mu} {G_{0}\left( \theta_{0} \right)}}} = {{\begin{pmatrix} {\cos \; \theta_{0}} & {{- \sin}\; \theta_{0}} & 0 & \cdots & 0 \\ {\sin \; \theta_{0}} & {\cos \; \theta_{0}} & 0 & \; & \vdots \\ 0 & 0 & 1 & \; & 0 \\ \vdots & \cdots & \; & \ddots & 0 \\ 0 & \cdots & 0 & 0 & 1 \end{pmatrix}{and}\mspace{14mu} {H_{0}\left( \phi_{0} \right)}} = \begin{pmatrix} {\cosh \; \phi_{0}} & {\sinh \; \phi_{0}} & 0 & \cdots & 0 \\ {\sinh \; \phi_{0}} & {\cosh \; \phi_{0}} & 0 & \; & \vdots \\ 0 & 0 & 1 & \; & 0 \\ \vdots & \cdots & \; & \ddots & 0 \\ 0 & \cdots & 0 & 0 & 1 \end{pmatrix}}}$

The angles θ₀ and φ₀ which minimize the non-diagonal terms of the matrices H₀ ^(T)(φ₀)×G₀ ^(T)(θ₀)×{circumflex over (R)}_(1 . . . K)×G₀(θ₀)×H₀(φ₀) resulting from this product are then sought.

This operation is repeated on the resulting matrices by shifting the elements of the Givens and hyperbolic rotation matrices to the following elements of the resulting matrices:

H₁^(T)(ϕ₁) × G₁^(T)(θ₁) × H₀^(T)(ϕ₀) × G₀^(T)(θ₀) × R̂₁ × G₀(θ₀) × H₀(ϕ₀) × G₁(θ₁) × H₁(ϕ₁) = D₁ ⋯ H₁^(T)(ϕ₁) × G₁^(T)(θ₁) × H₀^(T)(ϕ₀) × G₀^(T)(θ₀) × R̂_(K) × G₀(θ₀) × H₀(ϕ₀) × G₁(θ₁)H₁(ϕ₁) = D_(K) ${{with}\mspace{14mu} {G_{1}\left( \theta_{1} \right)}} = \begin{pmatrix} {\cos \; \theta_{1}} & 0 & {{- \sin}\; \theta_{1}} & 0 & \cdots & 0 \\ 0 & 1 & 0 & \; & \; & \; \\ {\sin \; \theta_{1}} & 0 & {\cos \; \theta_{1}} & 0 & \cdots & 0 \\ 0 & 0 & 0 & 1 & \; & 0 \\ \vdots & \; & \vdots & \; & \ddots & \; \\ 0 & \cdots & 0 & 0 & \; & 1 \end{pmatrix}$ ${{and}\mspace{14mu} {H_{1}\left( \phi_{1} \right)}} = \begin{pmatrix} {\cosh \; \phi_{1}} & 0 & {\sinh \; \phi_{1}} & 0 & \cdots & 0 \\ 0 & 1 & 0 & \; & \; & \; \\ {\sinh \; \phi_{1}} & 0 & {\cosh \; \phi_{1}} & 0 & \cdots & 0 \\ 0 & 0 & 0 & 1 & \; & 0 \\ \vdots & \; & \vdots & \; & \ddots & \; \\ 0 & \cdots & 0 & 0 & \; & 1 \end{pmatrix}$

and so forth for all the pairs of indices i and j:

H _(L) ^(T)(φ_(L))×G _(L) ^(T)(θ_(L))× . . . ×H ₀ ^(T)(φ₀)×G ₀ ^(T)(θ₀)×{circumflex over (R)}₁ ×G ₀(θ₀)×H ₀(φ₀)× . . . ×G _(L)(θ_(L))×H _(L)(φ_(L))= . . . H _(L) ^(T)(φ_(L))×G _(L) ^(T)(θ_(L))× . . . ×H ₀ ^(T)(φ₀)×G ₀ ^(T)(θ₀)×{circumflex over (R)}_(K) ×G ₀(θ₀)×H ₀(φ₀)× . . . ×G _(L)(θ_(L))×H _(L)(φ_(L))=D

These operations are repeated several times on all the pairs of indices i and j until all the Givens rotation angles θ and all the hyperbolic rotation angles φ are such that cos θ and cos hφ tend to 1, respectively.

Of course, one skilled in the art knows that other stopping criteria—implying global convergence of the method—are possible. For example, the global increase in the diagonal elements or the global reduction in the non-diagonal elements may be monitored in order to decide to stop the iterations.

The angles θ and φ which globally minimize the non-diagonal terms of the matrices M(i, j, θ, φ)^(T)×{circumflex over (R)}_(k)×M(i, j, θ, φ) may be obtained for example in the following way:

-   -   build a matrix C such that

${C = \begin{pmatrix} {\left( {{{\hat{R}}_{1}\left( {i,i} \right)} + {{\hat{R}}_{1}\left( {j,j} \right)}} \right)/2} & {\left( {{{\hat{R}}_{1}\left( {i,i} \right)} - {{\hat{R}}_{1}\left( {j,j} \right)}} \right)/2} & {{\hat{R}}_{1}\left( {i,j} \right)} \\ \vdots & \vdots & \vdots \\ {\left( {{{\hat{R}}_{k}\left( {i,i} \right)} + {{\hat{R}}_{k}\left( {j,j} \right)}} \right)/2} & {\left( {{{\hat{R}}_{k}\left( {i,i} \right)} - {{\hat{R}}_{k}\left( {j,j} \right)}} \right)/2} & {{\hat{R}}_{k}\left( {i,j} \right)} \\ \vdots & \vdots & \vdots \\ {\left( {{{\hat{R}}_{K}\left( {i,i} \right)} + {{\hat{R}}_{1}\left( {j,j} \right)}} \right)/2} & {\left( {{{\hat{R}}_{K}\left( {i,i} \right)} - {{\hat{R}}_{1}\left( {j,j} \right)}} \right)/2} & {{\hat{R}}_{K}\left( {i,j} \right)} \end{pmatrix}},$

-   -   calculate the generalized eigendecomposition of C^(T)×C and of         the diagonal matrix (−1,1,1), the eigenvector is noted as         v=(x,y,z)^(T) of the smallest positive generalized eigenvalue         such that z>0 and −x²+y²+z²=1,

calculate   ${{\cosh \; \phi} = \sqrt{\frac{1 + \sqrt{1 + x^{2}}}{2}}}\mspace{14mu}$ and   ${\sinh \; \phi} = {\frac{x}{\sqrt{2}\sqrt{1 + \sqrt{1 + x^{2}}}} = \frac{x}{2\cosh \; \phi}}$ calculate   ${{\cos \; \theta} = \sqrt{\frac{1 + \frac{z}{\sqrt{1 + x^{2}}}}{2}}}\mspace{14mu}$ and   ${\sin \; \theta} = {\frac{- y}{\sqrt{2\left( {1 + x^{2}} \right)\left( {1 + \frac{z}{\sqrt{1 + x^{2}}}} \right)}} = \frac{- y}{2\cos \; \theta \sqrt{1 + x^{2}}}}$

One skilled in the art will appreciate that other equivalent formulae exist since the question is only to solve the following equations:

$\quad\left\{ \begin{matrix} {2\cosh \; {\phi sinh}\; \phi} & = & x \\ {{- 2}\sin \; {\theta cos}\; {\theta \left( {{\cosh^{2}\phi} + {\sinh^{2}\phi}} \right)}} & = & y \\ {\left( {{2\cos^{2}\theta} - 1} \right)\left( {{\cosh^{2}\phi} + {\sinh^{2}\phi}} \right)} & = & z \end{matrix} \right.$

The formulae above have the advantage of having good numerical stability when x tends to 0 at the end of convergence of the method.

Other minor modifications may be useful for adapting to the finite accuracy of computer calculations. For example, the smallest positive eigenvalue may theoretically be in practice very slightly negative; nevertheless it should be retained. These points are obvious for one skilled in the art.

A more complete explanation of the calculation of the angles θ and φ from the matrix C will now be described.

The logic progression is the following.

For k=1, . . . , K, the product (noted as R′_(k)) of the resulting matrix is calculated R′_(k)=M (i, j,θ,φ)^(T)×{circumflex over (R)}_(k)×M(i,j,θ,φ):

R′ _(k) =H ^(T)(φ,i,j)×G ^(T)(θ,i,j)×{circumflex over (R)} _(k) ×G(θ,i,j)×H(φ,i,j)

Products of cosines of θ and sines of θ, as well as of hyperbolic cosines of φ and hyperbolic sines of φ, appear and are transformed into cosines and sines of the double angles (2θ, 2φ), which upon grouping the terms provides the following equations:

$\begin{pmatrix} {R_{1}^{\prime}\left( {i,j} \right)} \\ \vdots \\ {R_{k}^{\prime}\left( {i,j} \right)} \\ \vdots \\ {R_{K}^{\prime}\left( {i,j} \right)} \end{pmatrix} = {{C\begin{pmatrix} {\sinh \left( {2\phi} \right)} \\ {{- {\sin \left( {2\theta} \right)}}{\cosh \left( {2\phi} \right)}} \\ {{\cos \left( {2\theta} \right)}{\cosh \left( {2\phi} \right)}} \end{pmatrix}} = {{Cv}\left( {\theta,\phi} \right)}}$ Wherein ${C = \begin{pmatrix} {\left( {{{\hat{R}}_{1}\left( {i,i} \right)} + {{\hat{R}}_{1}\left( {j,j} \right)}} \right)/2} & {\left( {{{\hat{R}}_{1}\left( {i,i} \right)} - {{\hat{R}}_{1}\left( {j,j} \right)}} \right)/2} & {{\hat{R}}_{1}\left( {i,j} \right)} \\ \vdots & \vdots & \vdots \\ {\left( {{{\hat{R}}_{k}\left( {i,i} \right)} + {{\hat{R}}_{k}\left( {j,j} \right)}} \right)/2} & {\left( {{{\hat{R}}_{k}\left( {i,i} \right)} - {{\hat{R}}_{k}\left( {j,j} \right)}} \right)/2} & {{\hat{R}}_{k}\left( {i,j} \right)} \\ \vdots & \vdots & \vdots \\ {\left( {{{\hat{R}}_{K}\left( {i,i} \right)} + {{\hat{R}}_{K}\left( {j,j} \right)}} \right)/2} & {\left( {{{\hat{R}}_{K}\left( {i,i} \right)} - {{\hat{R}}_{1}\left( {j,j} \right)}} \right)/2} & {{\hat{R}}_{K}\left( {i,j} \right)} \end{pmatrix}},$

The sum of the squares of the non-diagonal elements (i, j) of the resulting matrix product M(i, j, θ, φ)^(T)×{circumflex over (R)}_(k)×M(i, j, θ, φ) is then equal to the quadratic form v(θ,φ)^(T)C^(T)C v(θ,φ).

It is noted that the vector v(θ,φ) describes a hyperboloid defined by v(θ,φ)^(T)J v(θ,φ)=1 wherein J=diag(−1,1,1),

The vector v(θ,φ) which maximizes the quadratic form v(θ,φ)^(T)C^(T)Cv(θ,φ) under the constraint v(θ,φ)^(T)J v(θ,φ)=1 is sought: it is well-known that this vector is necessarily one of the three generalized eigenvectors of the matrix pair (C^(T)C,J)

A study of this generalized eigendecomposition shows that this is the generalized eigenvector v=(x y z)^(T) corresponding to the smallest positive generalized eigenvalue of (C^(T)C,J);

${v\left( {\theta,\phi} \right)} = {\begin{pmatrix} {\sinh \left( {2\phi} \right)} \\ {{- {\sin \left( {2\theta} \right)}}{\cosh \left( {2\phi} \right)}} \\ {{\cos \left( {2\theta} \right)}{\cosh \left( {2\phi} \right)}} \end{pmatrix} = {\begin{pmatrix} x \\ y \\ z \end{pmatrix} = v}}$

Finally, in order to calculate the product M(i, j, θ, φ)^(T)×{circumflex over (R)}_(k)×M(i, j, θ, φ), the optimum values of θ and φ are unnecessary; only values of their (conventional and hyperbolic) cosines and sines are needed, which for example are given by:

${\cosh \; \phi} = {\frac{1}{\sqrt{2}}\sqrt{1 + \sqrt{1 + x^{2}}}}$ ${\sinh \; \phi} = \frac{x}{2\cosh \; \phi}$ ${\cos \; \theta} = {\frac{1}{\sqrt{2}}\sqrt{1 + \frac{z}{\sqrt{1 + x^{2}}}}}$ ${\sin \; \theta} = \frac{- y}{2\cos \; \theta \sqrt{1 + x^{2}}}$

The method therefore comprises the iteration of the products:

M(i,j,θ,φ)^(T)×{circumflex over (R)}_(k)×M(i,j,θ,φ)

for i<j between 1 and N and this several times until convergence.

The mixture matrix A is then equal to the inverse of H_(L) ^(T)(φ_(L))×G_(L) ^(T)(θ_(L))× . . . ×H₀ ^(T)(φ₀)×G₀ ^(T)(θ₀). Once the mixture matrix A is determined, it is possible to separate the source signals.

An exemplary implantation in Matlab language is given hereafter:

function [A,number_of_loops,matrices] = J_Di(matrices,threshold) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % %  J_Di.m written the 19th of december 2007 by Antoine Souloumiac. % %  This software computes the Joint Diagonalisation of a set of %  matrices. % %  input: %  matrices: array of matrices to be jointly diagonalized %  threshold: threshold for stop criterion on maximal sines % %  outputs: %  A : estimated mixture matrix %  number_of_loops: number of loops necessary for convergence %  matrices: array of matrices after joint diagonalization % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % initialization mat_size = size(matrices,1); nb_mat = size(matrices,3); A = eye(mat_size); go_on = 1; number_of_loops = −1; % joint dyadization loops while go_on  for n=2:mat_size   for m=1:n−1    % initialization    s_max = 0.0;    sh_max = 0.0;    % Givens and hyperbolic angles calculation    C = [ (squeeze(matrices(m,m,:)) + squeeze(matrices(n,n,:)))/2 ...      (squeeze(matrices(m,m,:)) − squeeze(matrices(n,n,:)))/2 ...       squeeze(matrices(m,n,:)) ];    [V,D] = eig(C′*C,diag([−1 1 1]));    % normalisation (−1 1 1) of generalized eigenvectors    V = V*diag(1./sqrt(abs(diag(V′*diag([−1 1 1])*V))));    % computation of the medium generalized eigenvalue    [sortD,index] = sort(diag(D));    ind_min = index(2);    v = V(:,ind_min)*sign(V(3,ind_min));    % computation of trigonometric and hyperbolic cosines and sines    ch = sqrt( (1+sqrt(1+v(1){circumflex over ( )}2))/2 );    sh = v(1)/( 2*ch );    c = sqrt( ( 1 + v(3)/sqrt(1+v(1){circumflex over ( )}2) )/2 );    s = −v(2)/( 2*c*sqrt( 1+v(1){circumflex over ( )}2 ) );    % maximal sine and sinh computation for stop test    s_max = max(s_max,abs(s));    sh_max = max(sh_max,abs(sh));    % product of Givens and hyperbolic rotations    rm11 = c*ch − s*sh;    rm12 = c*sh − s*ch;    rm21 = c*sh + s*ch;    rm22 = c*ch + s*sh;    % matrices Givens update    h_slice1 = squeeze(matrices(m,:,:));    h_slice2 = squeeze(matrices(n,:,:));    matrices(m,:,:) = rm11*h_slice1 + rm21*h_slice2;    matrices(n,:,:) = rm12*h_slice1 + rm22*h_slice2;    v_slice1 = squeeze(matrices(:,m,:));    v_slice2 = squeeze(matrices(:,n,:));    matrices(:,m,:) = rm11*v_slice1 + rm21*v_slice2;    matrices(:,n,:) = rm12*v_slice1 + rm22*v_slice2;    % dyads Givens update    col1 = A(:,m);    col2 = A(:,n);    A(:,m) = rm22*col1 − rm12*col2;    A(:,n) = −rm21*col1 + rm11*col2;   end  end  number_of_loops=number_of_loops+1;  % renormalization of the A columns (equal norm)  col_norm = sqrt(sum(A.{circumflex over ( )}2,1));  d = (prod(col_norm).{circumflex over ( )}(1/mat_size))./col_norm;  A = A.*repmat(d,mat_size,1);  matrices = matrices.*repmat((1./d)′*(1./d),[1 1 nb_mat]);  % stop test  if s_max<threshold & sh_max<threshold   go_on = 0;   disp([ ‘stop with max sinus and sinh =   ’ num2str([s_max sh_max]) ])  end  if number_of_loops>100    go_on = 0;    warning(‘maximum number of loops has been reached’)  end end

The calculation complexity has the approximate value of:

number of sweeps×2KN³

wherein:

-   -   the number of sweeps is the number of times the iterations are         repeated on the whole of the index pairs (i,j),     -   K is the number of {circumflex over (R)}_(k) matrices, and     -   N is the size of the {circumflex over (R)}_(k) matrices.

The present invention and the existing algorithms such as FFDiag [4], DNJD [6], LUJ1D [7], etc., have an equivalent computational cost by sweeping equal to 2KN³. The advantage of the present invention lies in the number of sweeps. More specifically, with the present method, the number of sweeps may be reduced as compared with known methods of the prior art.

As an example, a comparison of the method according to the present invention (designated hereafter as “J-Di”) and of the FFDiag (Ziehe [4]) and LUJ1D (Afsari [7]) methods will be presented in the following: this comparison illustrates the fact that the methods of the prior art have a number of sweeps at least twice as large as those of the method of the present invention.

Moreover, certain methods of the prior art are specialized on particular sets of matrices (Pham [3]: positive definite matrices, Vollgraf [5]: a set containing at least one positive definite matrix). The method according to the invention allows the processing of any types of set of matrices.

FIG. 5 illustrates an embodiment of the system for separating mixed signals according to the present invention. The system comprises processing means 10 (for joint diagonalization in which the mixture matrix A is estimated from a product of rotation matrices corresponding to one or more Givens rotation matrices and to one or more hyperbolic rotation matrices), such as a processor, display means 20, such as a display screen, and inputting means 30, such as a keyboard and a mouse. The system is connected to communication means so as to receive the mixed signals (x₁(t), . . . , x_(N)(t)) to be processed.

The reader will appreciate that many modifications may be brought to the invention as described above without materially departing from the teachings of the present document.

For example, the product of the Givens rotation (G(θ)) and hyperbolic rotation (H(φ)) matrices may correspond either to the multiplication of a Givens rotation matrix by a hyperbolic rotation matrix, or to the product of a hyperbolic rotation matrix by a Givens rotation matrix.

Consequently, all modifications of this type are intended to be incorporated within the scope of the appended claims.

Annex

With the present annex, the method described above may be better understood. Certain notations used earlier may possibly differ in the following. In any case, the changes in notation will be specified in order to facilitate understanding by the reader.

I. ABSTRACT

A new algorithm for computing the non-orthogonal joint diagonalization of a set of matrices is proposed for independent component analysis and blind source separation applications.

This algorithm is an extension of the Jacobi-like algorithm first proposed in the Joint Approximate Diagonalization of Eigenmatrices (JADE) method for orthogonal joint diagonalization. The improvement consists mainly in computing a mixing matrix of determinant one and columns of equal norm instead of an orthogonal mixing matrix. This target matrix is constructed iteratively by successive multiplications of not only Givens rotations but also hyperbolic rotations and diagonal matrices. The algorithm performance, evaluated on synthetic data, compares favorably with existing methods in terms of speed of convergence and complexity.

II. INTRODUCTION

Joint diagonalization algorithms are used for blind source separation (BSS) and independent component analysis (ICA) applications.

In these applications, one aims at separating independent signals (the sources) that were mixed instantaneously, i.e. without temporal convolution, by a matrix A observing only the mixtures and without any other prior information than the statistical independence of the sources.

Many BSS methods like [1], [1b], [2], [3], [4], [5], [11] contain a joint diagonalization step of a set R of K symmetric matrices R₁, R_(k), . . . R_(K) of size N×N. According to the application at hand and the corresponding expected properties of the source signals, the considered matrices R_(k) are for instance covariance matrices estimated on different time windows, third or fourth ([1], [1b]) or higher order cumulants slices, or inter-correlation matrices with time shifts [2], etc. All these matrices share the common structure

R_(k)=AD_(k)A^(T)  (1)

where A is the mixing matrix and the D_(k) are diagonal matrices. Under these conditions, the separation matrix defined by B

A⁻¹ is characterized by the property of diagonalizing every R_(k) i.e. BR_(k)B^(T)=D_(k) for 1≦k≦K. Under weak assumptions and up to the permutation and scaling indeterminations, this property is sufficient for estimating the separation matrix, or equivalently the mixing matrix, by a joint diagonalization procedure of the R_(k) matrices. In practice, the R_(k) matrices are corrupted by some estimation error and A (or B) is efficiently estimated via an approximate joint diagonalization of the set R.

Besides, the first joint diagonalization techniques, like JADE [1], [1b] or SOBI [2] for instance, require the positive definiteness of one matrix in R, generally the covariance matrix of the mixed sources (without noise). In this context, it is then possible to pre-whiten the mixed sources, which reduces the joint diagonalization problem to computing an orthogonal (or unitary in the complex case) mixing matrix. Unfortunately, this pre-whitening step is never exact due to the estimation error in the covariance matrix of the mixed sources. This preliminary error cannot be corrected by the Orthogonal Joint Diagonalization (OJD) that follows. As a consequence, many authors developped Non-Orthogonal Joint Diagonalization (NOJD) methods to avoid pre-whitening. These optimization methods address sets of positive definite matrices [3], or sets containing one positive definite matrix like QDiag [12], [5] or, eventually, any set of matrices [11], [4], [7]).

A new non-orthogonal joint diagonalization algorithm is proposed for any set of matrices for application to BSS and ICA based on an algebraic approach that enjoys the good convergence properties of Jacobi techniques like JADE. To obtain this result, the mixing matrix A is also constructed by multiplication of elementary matrices. But as A is not orthogonal, it is necessary to define a new multiplicative group that contains the group of orthogonal matrices as a sub-group, and a new set of elementary matrices that generalizes the Givens rotations. This can be achieved by iteratively building a mixing matrix of determinant equal to one and whose columns are of equal norm by successive multiplications of Givens rotations, hyperbolic rotations and diagonal matrices.

Hereafter in the third section, the constraints of determinant and column norm of the mixing matrix and their consequences on NOJD problem indeterminations are investigated.

In the fourth section, it is shown how a matrix of determinant one can be generated as a product of Givens rotations, hyperbolic rotations and diagonal matrices.

The new non-orthogonal approximate Joint Diagonalization algorithm, called J-Di for Joint-Diagonalization, is described in the fifth section and its performance is evaluated on synthetic data in section six. In particular simulations are provided on large sets of large matrices, for instance 200 matrices of size 200×200. This is interesting for applications as electro-encephallograms processing where the number of electrods reaches 128 or 256.

Finally, a short complexity comparison with the recent algorithm proposed by Wang, Liu and Zhang in [6] is added.

III. MATRICES OF DETERMINANT EQUAL TO ONE AND COLUMNS OF EQUAL NORM IIIA. Properties

It is assumed that any regular square mixing matrix A in model (1) has determinant equal to one and columns of equal norm. As a matter of fact, it is always possible to insert in (1) a regular diagonal matrix D_(A)

R _(k) =AD _(k) A ^(T)=(AD _(A))(D _(A) ⁻¹ D _(k) D _(A) ⁻¹)(AD _(A))^(T)  (2)

The matrices D_(A) ⁻¹D_(k)D_(A) ⁻¹ are obviously diagonal and D_(A) can be tuned such that AD_(A) verifies the required determinant and column norm properties. It is sufficient to consider

$\begin{matrix} {{D_{A} = {\rho \begin{pmatrix} \frac{{sign}\left( {\det (A)} \right)}{a_{1}} & 0 & \cdots & 0 \\ 0 & \frac{1}{a_{2}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{a_{N}} \end{pmatrix}}}{with}} & (3) \\ {\rho = \sqrt[N]{\frac{{a_{1}}{a_{2}}\cdots {a_{N}}}{{\det (A)}}}} & (4) \end{matrix}$

where a_(n) is the n-th column of A and ∥a_(n)∥ its Euclidean norm. These constraints on the mixing matrix provide a kind of normalization that is useful for numerical accuracy. In particular, the condition number of AD_(A) is often smaller than the condition number of A but this is not a general rule.

III.B. Uniqueness Conditions and Indeterminations of the Mixing Matrix

As described in [9], [10], there is a unique solution to the NOJD problem if and only if there are no colinear columns in the matrix D resulting from the vertical concatenation of the D_(k) diagonals (D is K×N and its (k, n) entry is the n-th diagonal entry of D_(k)). If ρ_(nm) is the absolute value of the cosine of the D n-th and m-th columns angle and ρ

maxt_(n,m)ρ_(nm), then the uniqueness condition can be simply be reformulated as: ρ<1. This parameter ρ, called the modulus of uniqueness, is also an indicator of the difficulty of an NOJD problem with the obvious rule: the closer ρ is to 1, the more difficult is the NOJD.

It will be understood that a “unique solution” of the NOJD problem is only “unique” up to the classical permutation and scale indeterminations of the columns of the mixing matrix. The unity determinant and equal column norm properties (of the present invention) slightly change these indeterminations. As a matter of fact, it is easy to show that the permutation indetermination remains but that the scale indetermination reduces to a sign indetermination. More precisely, an even number of the mixing matrix columns can be multiplied by −1.

IV. A NEW CLASS OF ELEMENTARY MATRICES TO SPAN THE SPECIAL LINEAR GROUP

The goal of this section is to propose sets of elementary matrices which generate the set A(R,N) and the group SL(R,N) by repeated multiplications.

IV.A. Unique Decomposition in A(R, 2)

Consider a matrix A in A(R, 2) (det(A)=1 and ∥a₁∥) and its polar decomposition A

GH where G is orthogonal and H is symmetric positive. As the determinants of A and H are both positive, the determinant of G is equal to plus one and G is necessarily a Givens rotation G(θ) with

$\begin{matrix} {{G(\theta)}\overset{\Delta}{=}\begin{pmatrix} {\cos \; \theta} & {{- \sin}\; \theta} \\ {\sin \; \theta} & {\cos \; \theta} \end{pmatrix}} & (5) \end{matrix}$

It will be shown now that H, whose determinant is also equal to plus one, is actually a hyperbolic rotation.

Denote e, f and g the entries of H

$\begin{matrix} {H\overset{\Delta}{=}\begin{pmatrix} e & f \\ f & g \end{pmatrix}} & (6) \end{matrix}$

H=G(−θ)A has columns of equal norm like A. Therefore e²+f²=f²+g², H is positive by construction, hence e>0 and g>0 and e=g. The determinant of H is equal to 1, then e²−f²=1 and there is a unique angle φ such that e=cos h(θ) and f=sinh(φ). Finally H=H(φ) with

$\begin{matrix} {{H(\phi)}\overset{\Delta}{=}\begin{pmatrix} {\cosh \; \phi} & {\sinh \; \phi} \\ {\sinh \; \phi} & {\cosh \; \phi} \end{pmatrix}} & (7) \end{matrix}$

In conclusion, any matrix of A(R, 2) can be decomposed in a unique product of a Givens and a hyperbolic rotation.

IV.B. Multiple Solutions in SL(R, 2)

Any matrix F in SL(R, 2) can be uniquely decomposed in the product of a matrix A in A(R, 2) multiplied by a 2-by-2 diagonal matrix D(λ) with

$\begin{matrix} {{D(\lambda)}\overset{\Delta}{=}\begin{pmatrix} \lambda & 0 \\ 0 & {1/\lambda} \end{pmatrix}} & (8) \end{matrix}$

and λ=√{square root over (∥f₁∥/∥f₂∥)} where f₁, f₂ are the columns of F and ∥f₁∥, ∥f₂∥ their norms. The matrix A can be decomposed in a product of a Givens and a hyperbolic rotation as shown in the previous section.

Finally, there are two unique angles: θ between −π and +π and φ in R, and a unique positive real number λ such that

F=G(θ)H(φ)D(λ)  (9)

Further decompositions can be derived from the singular value decomposition (SVD) of F

USV^(T).

Necessarily det(S)=1 and det(U)=det(V)=±1. But note that it is always possible to change the sign of the first column (or line) of U and V so that det(U)=det(V))=+1. This proves that both U and V can be chosen as Givens rotations: U=G(θ₁) and V=G(θ₂). Obviously, S=D(s₁) where s₁>0 is the first S singular value. The following expression is finally obtained

F=G(θ₁)D(s ₁)G(θ₂)^(T)  (10)

In addition, the eigenvalue decomposition (EVD) of any hyperbolic rotation H(φ) gives

H(φ)=G(π/4)D(e ^(φ))G(π/4)^(T)  (11)

and consequently

F=G(θ₁−π/4)H(log s ₁)G(θ₂−π/4)^(T)  (12)

These decompositions prove that any matrix of SL(R, 2) can be generated by multiplying Givens rotations and diagonal matrices, or Givens rotations and hyperbolic rotations. In addition, the diagonal matrices can be used to normalize the columns of any matrix in SL(R, 2) to get a matrix of A(R, 2). These results can be extended to dimension N.

IV.C. The Extension to SL(R,N)

The following notation is firstly introduced.

For any i<jε{1, . . . , N}G(θ, i, j) denotes the classical Givens rotation

$\begin{matrix} {{G\left( {\theta,i,j} \right)}\overset{\Delta}{=}\begin{pmatrix} I_{i - 1} & \vdots & 0 & \vdots & 0 \\ \cdots & {\cos \; \theta} & \cdots & {{- \sin}\; \theta} & \cdots \\ 0 & \vdots & I_{j - i - 1} & \vdots & 0 \\ \cdots & {\sin \; \theta} & \cdots & {\cos \; \theta} & \cdots \\ 0 & \vdots & 0 & \vdots & I_{N - j} \end{pmatrix}} & (13) \end{matrix}$

H(φ, i, j) is the corresponding hyperbolic rotation

$\begin{matrix} {{H\left( {\phi,i,j} \right)}\overset{\Delta}{=}\begin{pmatrix} I_{i - 1} & \vdots & 0 & \vdots & 0 \\ \cdots & {\cosh \; \phi} & \cdots & {\sinh \; \phi} & \cdots \\ 0 & \vdots & I_{j - i - 1} & \vdots & 0 \\ \cdots & {\sinh \; \phi} & \cdots & {\cosh \; \phi} & \cdots \\ 0 & \vdots & 0 & \vdots & I_{N - j} \end{pmatrix}} & (14) \end{matrix}$

and D(λ, i, j) denotes the following diagonal matrix

$\begin{matrix} {{D\left( {\lambda,i,j} \right)}\overset{\Delta}{=}\begin{pmatrix} I_{i - 1} & \vdots & 0 & \vdots & 0 \\ \cdots & \lambda & \cdots & 0 & \cdots \\ 0 & \vdots & I_{j - i - 1} & \vdots & 0 \\ \cdots & 0 & \cdots & {1/\lambda} & \cdots \\ 0 & \vdots & 0 & \vdots & I_{N - j} \end{pmatrix}} & (15) \end{matrix}$

Very similarly to the N=2 case, we use the SVD F

USV^(T) of a matrix F in SL(R,N) is used. The determinant of the orthogonal matrices U and V can be chosen equal to plus one by changing the sign of the first column of U and V. It is well known that every orthogonal matrix of determinant one (in SO(R,N)) can be decomposed in the product of Givens rotations. By construction S is diagonal, positive definite and its determinant is equal to plus one (the product of the singular values s₁, s₂, . . . , s_(N) is equal to one). One can also show that

S=D(s ₁,1,2)D(s ₁ s ₂,2,3) . . . D(s ₁ s ₂ . . . s _(N-1) ,N−1,N)  (16)

In addition, the D(λ, i, j) matrices can be transformed in the product of Givens and hyperbolic rotations like in the two-dimensional case. Finally the same conclusions hold in SL(R,N) as in SL(R, 2): every matrix in SL(R,N) can be generated by multiplying Givens rotations and diagonal matrices, or Givens and hyperbolic rotations. The diagonal matrices can be used to normalize the columns of any matrix in SL(R,N) to get a matrix in A(R,N).

It will be seen in the next section that the parameters of the three elementary matrices described above can be easily optimized in the NOJD context.

V. THE J-DI ALGORITHM

In this section, it is described how to construct the unknown matrix A of determinant equal to one and columns of equal norm as a product of the elementary matrices G(θ, i, j), H(φ, i, j) and D(λ, i, j) by jointly diagonalizing the set R={circumflex over (R)}₁, . . . , {circumflex over (R)}_(k), . . . {circumflex over (R)}_(K) corresponding to estimations of the exact matrices of R₁, . . . R_(k), R_(K).

More precisely, matrices G(θ, i, j) and H(φ, i, j) are used to minimize off-diagonal entries, while D(λ, i, j) matrices are used to normalize the columns.

V.A. Optimization of the Parameters θ and φ to Minimize the (i, j) Off-Diagonal Entries

For k=1, . . . , K denote by R′_(k) the matrix obtained by updating {circumflex over (R)}_(k) in the following way

R′ _(k) =H ^(T)(φ,i,j)×G ^(T)(θ,i,j)×{circumflex over (R)} _(k) ×G(θ,i,j)×H(φ,i,j)  (17)

It is proposed to minimize the sum of the squares of the (i, j)-entries, denoted R′_(k)[i, j], of the K matrices R′_(k) i.e.

$\begin{matrix} {\left( {\theta_{opt},\phi_{opt}} \right)\overset{\Delta}{=}{\arg {\min\limits_{\theta,\phi}{\sum\limits_{k = 1}^{k = K}{R_{k}^{\prime}\left\lbrack {i,j} \right\rbrack}^{2}}}}} & (18) \end{matrix}$

Minimizing this quantity is not equivalent to minimizing all the off-diagonal entries or a global cost function like in JADE. Nevertheless, the performance evaluation below will show that this iterated local minimization actually jointly diagonalizes the matrix set R. After some manipulations, one can see that

$\begin{matrix} {{\begin{pmatrix} {R_{1}^{\prime}\left( {i,j} \right)} \\ \vdots \\ {R_{k}^{\prime}\left( {i,j} \right)} \\ \vdots \\ {R_{K}^{\prime}\left( {i,j} \right)} \end{pmatrix} = {{C\begin{pmatrix} {\sinh \left( {2\phi} \right)} \\ {{- {\sin \left( {2\theta} \right)}}{\cosh \left( {2\phi} \right)}} \\ {{\cos \left( {2\theta} \right)}{\cosh \left( {2\phi} \right)}} \end{pmatrix}} = {{Cv}\left( {\theta,\phi} \right)}}}{Where}} & (19) \\ {{C = \begin{pmatrix} {\left( {{{\hat{R}}_{1}\left( {i,i} \right)} + {{\hat{R}}_{1}\left( {j,j} \right)}} \right)/2} & {\left( {{{\hat{R}}_{1}\left( {i,i} \right)} - {{\hat{R}}_{1}\left( {j,j} \right)}} \right)/2} & {{\hat{R}}_{1}\left( {i,j} \right)} \\ \vdots & \vdots & \vdots \\ {\left( {{{\hat{R}}_{k}\left( {i,i} \right)} + {{\hat{R}}_{k}\left( {j,j} \right)}} \right)/2} & {\left( {{{\hat{R}}_{k}\left( {i,i} \right)} - {{\hat{R}}_{k}\left( {j,j} \right)}} \right)/2} & {{\hat{R}}_{k}\left( {i,j} \right)} \\ \vdots & \vdots & \vdots \\ {\left( {{{\hat{R}}_{K}\left( {i,i} \right)} + {{\hat{R}}_{K}\left( {j,j} \right)}} \right)/2} & {\left( {{{\hat{R}}_{K}\left( {i,i} \right)} - {{\hat{R}}_{K}\left( {j,j} \right)}} \right)/2} & {{\hat{R}}_{K}\left( {i,j} \right)} \end{pmatrix}},} & (20) \end{matrix}$

Note that the squaring of the coefficients of G and H in R′_(k) is transformed in double (trigonometric and hyperbolic) angles. The sum of squares of the R′k [i, j] is equal to v(θ,φ)^(T)C^(T)C v(θ,φ). Hence the minimization in (18) becomes

$\begin{matrix} {\left( {\theta_{opt},\phi_{opt}} \right)\overset{\Delta}{=}{\arg {\min\limits_{\theta,\phi}{{v\left( {\theta,\phi} \right)}^{T}C^{T}{{Cv}\left( {\theta,\phi} \right)}}}}} & (21) \end{matrix}$

where the vector v(θ,φ) is normalized not in the Euclidean norm but in a hyperbolic norm. As a matter of fact, we have −v₁(θ, φ²+v₂(θ,φ)²+v₃(θ, φ)²=1, i.e. v(θ,φ^(T)J v(θ,φ)=1 with J=diag(−1,1,1).

One can prove that the whole hyperboloid defined by v(θ,φ)^(T)J v(θ,φ)=1 is spanned by v(θ,φ) with −π/2≦θ≦π/2, and φ in R. In other words, the minimization of the quadratic quantity v^(T)C^(T)Cv under the quadratic constraint v^(T) Jv=1 is achieved. The solution v is known to be a generalized eigenvector of (C^(T)C, J), i.e. v is such that C^(T)Cv=δ_(v)Jv where δ_(v) is the generalized eigenvalue of v.

This generalized eigen decomposition is studied hereafter. Denote V the 3×3 matrix whose columns are the generalized eigenvectors and Δ the diagonal matrix of the generalized eigenvalues of (C^(T)C, J), then C^(T)CV=JVΔ and therefore V^(T)C^(T)CV=V^(T)JVΔ. It is known that V^(T)JV is diagonal and has one negative and two positive diagonal entries (Sylvester law of inertia) like J. As C^(T)C is positive, Δ has the same inertia as J and there are:

-   -   one generalized eigenvector with a negative generalized         eigenvalue and a negative J-norm v^(T)Jv=−1, and     -   two generalized eigenvectors with positive generalized         eigenvalues and positive J-norm v^(T)Jv=+1.

The generalized eigenvector we are looking for is such that v^(T)Jv is equal to +1 and that v^(T)C^(T)Cv is minimal, which means that it corresponds to the smallest positive generalized eigenvalue of (C^(T)C, J).

In summary, the optimal Givens and hyperbolic angles θ and φ are defined by the equation

${v\left( {\theta,\phi} \right)} = {\begin{pmatrix} {\sinh \left( {2\phi} \right)} \\ {{- {\sin \left( {2\theta} \right)}}{\cosh \left( {2\phi} \right)}} \\ {{\cos \left( {2\theta} \right)}{\cosh \left( {2\phi} \right)}} \end{pmatrix} = {\begin{pmatrix} x \\ y \\ z \end{pmatrix} = v}}$

where v=(x,y,z)^(T) is the generalized eigenvector of (C^(T)C, J) of the smallest positive eigenvalue.

Since an eigenvector is determined up to its sign, it is always possible to choose v such that z>0 i.e. such that cos 2θ is positive. Besides, the expression (17) is invariant when θ is replaced with θ+π.

Hence θ can be chosen between −π/2 and −π/2 such that cos(θ)>0. Thus, it can assume that both cos θ and cos 2θ are positive, which is equivalent to assuming −π/4≦θ≦π/4.

Note that the explicit expression of the angles θ and φ is not necessary. Only the computation of cos h φ, sin h φ, cos θ and sin θ is necessary to implement the update (17). Simple trigonometry gives immediately the following solutions of (22)

$\begin{matrix} {{\cosh \; \phi} = {\frac{1}{\sqrt{2}}\sqrt{1 + \sqrt{1 + x^{2}}}}} & (23) \\ {{\sinh \; \phi} = \frac{x}{2\cosh \; \phi}} & (24) \\ {{\cos \; \theta} = {\frac{1}{\sqrt{2}}\sqrt{1 + \frac{z}{\sqrt{1 + x^{2}}}}}} & (25) \\ {{\sin \; \theta} = \frac{- y}{2\cos \; \theta \sqrt{1 + x^{2}}}} & (26) \end{matrix}$

At this stage, a method for globally minimizing the R′_(k) [i, j] for a given couple of indices 1≦i≦j≦N with a Givens rotation G(θ, i, j) and a hyperbolic rotation H(φ, i, j) is known. Once these optimal angles θ and φ are computed, every matrix of the set R is updated by (17) which can be achieved by computing only two linear combinations of the lines or the columns i and j of R_(k).

The mixing matrix, initialized at the N×N identity, is updated by

$\begin{matrix} {A^{\prime} = {{{AG}\left( {\theta,i,j} \right)}^{- T}{H\left( {\phi,i,j} \right)}^{- T}}} & (27) \\ {\mspace{25mu} {= {{{AG}\left( {\theta,i,j} \right)}{H\left( {{- \phi},i,j} \right)}}}} & (28) \end{matrix}$

in order to conserve AR_(k)A^(T)=A′R′_(k)A′^(T). After an update, the norm of the columns of A are no longer equal and can be re-normalized with matrices D(i, j, λ). It is explained in the next section that it is not necessary to carry out this normalization at each step. The R_(k) and A updates are repeated for each pair of indices (l, j) with 1≦i<j≦N like in the Jacobi algorithm [8]. A complete updating over the N(N−1)/2 pairs of indices is called a sweep. Similarly to the classical Jacobi algorithms, several sweeps are generally necessary until complete convergence.

V.B. Details of Implementation

Other sequences of rotations could be imagined. As the Givens rotations are orthogonal, they generate less roundoff noise than the hyperbolic rotations. So, one could for instance begin the joint diagonalization by several sweeps of pure Givens rotations updates (i.e. JADE sweeps) before using the non-orthogonal hyperbolic rotations. This precaution was not necessary in our numerical experiments, even for badly conditioned mixing matrices (condition number up to 100000). The loss of numerical accuracy is probably linked to the hyperbolic rotations with a large angle φ: these events could also be monitored.

The JADE or SOBI pre-whitening is not necessary but can still be used for improving numerical accuracy if R contains a positive definite matrix. In BSS applications, the reduction of the data to the subspace of few principal components can obviously be computed if the prior knowledge that the number of sources is smaller than the number of sensors is known.

The expressions (23), (24), (25), (26) are not the only possible inversion formulae of (22). They have been chosen for their good numerical accuracy when v1 and v2 converge to 0. A better choice possibly exists. When close to convergence, the matrices C and C^(T)C are close to singularity. Then the generalized eigen decomposition of (C^(T)C, J) can be numerically difficult and sometimes the target generalized eigenvector no longer corresponds to the smaller positive generalized eigenvalue but to a very small negative eigenvalue. To avoid selecting the erroneous larger positive eigenvalue, it is preferable to choose the eigenvector associated with the median generalized eigenvalue instead of the smaller positive one.

It is implemented in the following simulations. Other solutions like discarding the update if the eigenvalue signs are not correct (+,+,−) or developping a specific procedure to compute only the target eigenvector could be investigated.

It is observed in the simulations that the normalization of the columns of A is not necessary at each step but may be performed only once per sweep. It is simply achieved by multiplying A at the right by a diagonal matrix D_(A) and the R_(k) by D⁻¹ _(A) on both sides (similarly to equations (2) and (3)) with

$\begin{matrix} {D_{A} = {\sqrt[N]{{a_{1}}{a_{2}}\cdots {a_{N}}}\begin{pmatrix} \frac{1}{a_{1}} & 0 & \cdots & 0 \\ 0 & \frac{1}{a_{2}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{a_{N}} \end{pmatrix}}} & (29) \end{matrix}$

where a_(n) is the n-th column of A and ∥a_(n)∥ its Euclidean norm.

The cost of this normalization (2KN²) at each sweep is negligible compared to the update (17) of R_(k)(2KN³). In view of the simulations, the present invention still converges without any normalization of the columns' norm.

Several stop criteria can be imagined like monitoring the norm of off-diagonal terms and detecting its stabilization at a minimal value. For instance in the present case, it is chosen to stop the iterations when all the trigonometric and hyperbolic sines of a sweep are very small (a threshold between 10⁻²⁰ and 10⁻²⁵ seems relevant for exactly diagonalizable matrices and computations in Matlab double precision; this value has to be increased between 10⁻¹⁰ and 10⁻²⁰ for non-exactly diagonalizable sets).

V.C. Summary of the J-Di Algorithm

The numerical simulations of the next section implement the following version of J-Di.

Require: K symmetric N×N matrices R1, . . . RK and a threshold τ

A ← I_(N) while max_(i,j)(|sinhφ|,|sinθ|) > τ do  for all i,j such that 1 ≦ i < j ≦ N do   build C as in (20)   compute the eigenvector v

 (v₁,v₂,v₃)^(T) of   (C^(T)C,diag(−1,1,1)) of median eigenvalue   if v₃ < 0 then    v ← −v   end if   compute coshφ and sinhφ as in (23) and (24)   compute cosθ and sinθ as in (25) and (26)   for k = 1 to K do    R_(k) ← H(φ,i,j)G(−θ,i,j)R_(k)G(θ,i,j)H(φ,i,j)    A ← AG(θ,i,j)H(−φ,i,j)   end for  end for  compute D_(A) as in (29)  A ← AD_(A)  for k = 1 to K do   R_(k) ← D_(A) ⁻¹R_(k)D_(A) ⁻¹  end for end while

The computational cost of the algorithm is mainly the cost of updating the R_(k) matrices for each pair of indices i and j, i.e. the cost of (17). By using the sparsity of G and H and the symmetry of the R_(k), the cost of this update can be reduced to 4KN flops. For one sweep over the N(N−1)/2 pairs of indices, the cost is then roughly 2KN³. The number of sweeps to be made until convergence depends on the size and the difficulty of the NOJD problem at hand as described in the next section.

V.D. Short Comparison with Afsari Algorithms

Several similarities between J-Di and the four algorithms proposed in [7] deserve to be emphasized.

-   -   Both methods construct the diagonalizer as a product of planar         transformations: Givens rotations and triangular matrices in         QRJ1D and QRJ2D, or only triangular matrices in LUJ1D and LUJ2D         methods, and Givens rotations and hyperbolic rotations in the         case of J-Di. In all cases, the determinant of these         transformations is one and their product spans the special         linear group.     -   LUJ1D, LUJ2D, QRJ1D and QRJ2D minimize global cost functions,         J-Di minimizes a local cost function for each current pair of         indices (i; j),     -   For each pair of indices the two angles are simultaneously         optimized by J-Di whereas the two optimizations are independent         in LUJ1D, LUJ2D, QRJ1D and QRJ2D.     -   In view of (necessarily) limited simulations, LUJ1D, LUJ2D,         QRJ1D and QRJ2D seem to converge linearly and J-Di         quadratically.     -   The computational costs per sweep are identical (if grouping the         two (i; j)-updates).

LUJ1D is selected for the comparison in the next section because it seems faster in the specific simulated contexts.

VI. PERFORMANCE EVALUATION ON NUMERICAL SIMULATIONS

The method according the present invention (i.e. J-Di) is compared to FFDiag [4] and LUJ1D [7] because these two last NOJD algorithms do not require the positive definiteness of any matrix of R.

One iteration (called sweep below) of the FFDiag algorithm has approximatively the same complexity (2KN³ for updating R) than one J-Di or LUJ1D sweep. Therefore, the only thing which will be compared is the number of sweeps necessary for convergence.

Nevertheless, the operations are structured differently within one sweep: J-Di and LUJ1D require N(N−1)/2 very sparse updates of R whereas FFDiag requires only a complete dense update.

This difference probably makes a FFDiag sweep slightly faster than a J-Di or LUJ1D sweep when implemented in an interpreted language like Matlab. But an implementation in C for instance should show similar execution times.

Two different criteria of performance are defined and measured first on a set of exactly diagonalizable matrices and second on a set of approximatively diagonalizable matrices. The simulations are run in Matlab double precision.

VI.A. Criteria of Performance for Non-Orthogonal Joint Diagonalization

Two ways of measuring the quality of a joint diagonalization are used below. The first criterion consists in computing the ratio of the off-diagonal norm, defined as the sum of the squared off-diagonal entries of R, and the diagonal norm, defined as the sum of the squared diagonal entries of R. This non-standard normalization is useful since the diagonal norm is not constant during the first sweeps.

A second criterion is necessary to assess the quality of the estimation of the mixing matrix and the corresponding separation performance. As a matter of fact, the inverse of the estimated mixing matrix Â is the optimal spatial filter (in the absence of additive noise) for separating the sources mixed by A.

The closer the product Â⁻¹A is to a permutation matrix

times a diagonal matrix D, the better is the separation. Therefore, it is natural to measure the quality of separation by computing the distance between P

Â⁻¹A and the closest product

D of a permutation and a diagonal matrix.

In BSS, it is standard to use the index of performance I(P) defined by

$\begin{matrix} {{I(P)}\overset{\Delta}{=}{{\frac{1}{2{N\left( {N - 1} \right)}}{\sum\limits_{n = 1}^{n = N}\left( {{\sum\limits_{m = 1}^{m = N}\frac{P_{nm}}{\max_{k}{P_{nk}}}} - 1} \right)}} + {\frac{1}{2{N\left( {N - 1} \right)}}{\sum\limits_{m = 1}^{m = N}\left( {{\sum\limits_{n = 1}^{n = N}\frac{P_{nm}}{\max_{k}{P_{km}}}} - 1} \right)}}}} & (30) \end{matrix}$

even if it is not a perfect distance to

D. The normalization term 1/2N(N−1) has been chosen so that I(P) is equal to α for 0≦α<1 when P is a matrix with diagonal entries equal to ones and off-diagonal entries equal to α. Hence I(P) gives the order of magnitude of the relative amplitude of interferences in the separated sources (cross-talk index).

VI.B. Exactly Diagonalizable Matrix Set

The mixing matrix entries are independent normally distributed, its determinant is set to 1 and its columns are normalized to the same norm using the equation (3). Those random mixing matrices generally have a condition number smaller than 10⁴ or 10⁵, rarely more as shown in the following figures. A more specific validation of the J-Di algorithm should be done for applications involving worse-conditioned mixing matrices. The diagonal entries of the D_(k) matrices are also chosen from a standard normal distribution. Note that this way of generating the D_(k) determines the distribution of the modulus of uniqueness which only depends on K and N. The R_(k) are computed by R_(k)=AD_(k)A^(T).

1) Quadratic Convergence:

-   -   The diagonality ratios of the J-Di, FFDiag and LUJ1D algorithms         are computed after each sweep until convergence and are plotted         in FIG. 2 for N=K=50 and 50 independent trials. In this case,         the modulus of uniqueness is small almost always between 0.4 and         0.6.     -   The separation criterion is plotted in FIG. 3 under the same         conditions.     -   These results indicate that FFDiag and J-Di algorithms seem to         have (at least) quadratic convergence but that J-Di requires         less sweeps. In spite of a final linear convergence LUJ1D shows         intermediate performance.

2) Influence of the Matrix and Set Sizes on the J-Di Convergence Speed:

-   -   For different mixing matrices and different set sizes (N=K=5,         10, 20, 50, 100 and 200), the number of sweeps necessary to         reach convergence of J-DI DI is counted and its distribution for         100 independent trials is given in the following table:

≦4 5 6 7 8 9 10 ≧11 N = K = 5 1 86 13 0 0 0 0 0 N = K = 10 0 5 81 14 0 0 0 0 N = K = 20 0 0 27 65 7 1 0 0 N = K = 50 0 0 0 48 47 5 0 0 N = K = 100 0 0 0 11 74 12 3 0 N = K = 200 0 0 0 2 43 41 11 3

-   -   It can be seen that the J-Di number of sweeps increases with the         matrix size but remains roughly smaller than 10 even for large         sets (K=200) of large matrices (N=200).

VI.C. Non-Exactly Diagonalizable Sets of Matrices: Application to Blind Source Separation

The performance of the J-Di, FFDiag and LUJ1D non-orthogonal joint diagonalization algorithms is compared on non-exactly diagonalizable sets of matrices.

To evaluate the method according the present invention (J-Di) on a set of matrices that are neither positive (like covariance matrices) nor negative, a set of fourth-order cumulant slices is chosen.

Consider the classical BBS model: x=As +n where x, s and n are respectively the N-dimensional vectors of observed mixtures, sources and additive Gaussian noise, and A is the N×N mixing matrix. The considered matrix set R contains the K=N(N+1)/2 matrices R_(mn) of size N×N defined by

R_(mn)

Cum(:,:,x_(m),x_(n))  (31)

for all m and n such that 1≦m≦n≦N and where x_(n) denotes the n-th entry of x.

It is considered N=20 sources and therefore K=N(N−1)/2=210 cumulant slices (matrices) of size 20×20. The samples of source signals are independent and uniformly distributed between −√{square root over (3)} and √{square root over (3)} to get non-Gaussian, zero-mean and unity-variance signals. The standard deviation of the additive zero-mean Gaussian noise σ is first set to zero and second set to σ=0.1. The cumulants are estimated on 100000 independent samples. In these cases, the number of matrices K is large enough such that the modulus of uniqueness is small, generally between 0.3 and 0.6. It explains why this parameter does not seem to influence the performance in the simulations and is therefore not presented.

Speed of Convergence:

The convergence of the diagonality ratio and the separation criterion are plotted as functions of the number of sweeps with noise in FIG. 4.

In all cases the method according to the present invention (J-Di) converges faster than FFDiag and LUJ1D which seem to behave similarly like in [7]. J-Di outperforms slightly FFDiag and LUJ1D in terms of the final separation criterion, which is what matters for BSS applications.

These simulations also show that different convergence rates, quadratic for FFDiag and linear for LUJ1D, can be compatible with similar performance when the matrix set is not exactly diagonalizable.

VII. COMPARISON WITH THE ALGORITHM OF WANG, LIU AND ZHANG

A comparison with the algorithm of Wang, Liu and Zhang [6] (hereafter DNJD) will be discussed herebelow. The proposed algorithm DNJD builds the mixing matrix by successive multiplications of matrices J(θ, Ψ, i, j) defined by

${J\left( {\theta,\psi,i,j} \right)}\overset{\Delta}{=}\begin{pmatrix} I_{i - 1} & \vdots & 0 & \vdots & 0 \\ \cdots & {{\rho cos}\; \theta} & \cdots & {{\rho sin}\; \theta} & \cdots \\ 0 & \vdots & I_{j - i - 1} & \vdots & 0 \\ \cdots & {{\rho cos}\; \psi} & \cdots & {{\rho sin}\; \psi} & \cdots \\ 0 & \vdots & 0 & \vdots & I_{N - j} \end{pmatrix}$

With:

$\rho = \left( {1 + \frac{\sin \; 2{\theta sin2\psi}}{2}} \right)^{{- 1}/4}$

This definition is clearly an attempt to generalize the Givens rotations and the Jacobi-like algorithm of JADE. It is clear that the DNJD algorithm is very different from the present invention. For instance, DNJD does not use nor the determinant constraint neither the hyperbolic trigonometry that are used in the method according to the invention.

The main difficulty with DNJD is the inversion of Eq. (16). No exact solution is provided when the R_(k) matrices are not exactly diagonalizable making necessary to use approximate expressions that do not guarantee the actual minimization of the off-diagonal entries. Therefore it is necessary to compute at each iteration (for each pair of indices) the evolution of the off-diagonal entries and to discard the update if the off-diagonal entries are increased. Unfortunately, the computational cost of this verification, 4KN, is equivalent to the cost of the algorithm itself (see page 5303 at the end of section III). Hence the cost of each sweep is multiplied by a factor of two. In addition, each time a couple of angles (θ, φ) is discarded, many computations (8KN flops) are made without any improvement of the cost function. This waste could explain the larger number of sweeps used by DNJD (FIG. 1 p. 5303 in [6]) than by J-Di (table I).

In summary, J-Di over-performs DNJD in terms of complexity per sweep by a factor larger than two.

VIII. CONCLUSION

A new method (J-Di) of non-orthogonal joint diagonalization has been presented. In one embodiment, this method comprises:

constructing a mixing matrix that belongs to the special linear group and whose columns are of equal norm, instead of constructing it in the special orthogonal group,

multiplying Givens rotations, hyperbolic rotations and diagonal matrices instead of only Givens rotations.

This new method admits many variants: the order of the elementary matrix multiplications, the stop test, the pre-whitening or not, etc. Its robustness and reduced complexity allow to deal with large dimension problems: up to N=500 and K=600 on a standard computer.

REFERENCES

-   [1] Comparaison de methodes de separations de sources, Souloumiac     A., Cardoso J.-F., Proceedings of GRETSI 1991, Juan les Pins,     Septembre 1991. -   [1b] Blind beamforming for non Gaussian signals, Jean-Frangois     Cardoso and Antoine Souloumiac, In IEE Proceedings-F,     140(6):362-370, December 1993. -   [2] A blind source separation technique using second-order     statistics, Belouchrani A., AbedMeraim K., Cardoso J.-F., Moulines     E., IEEE TRANSACTIONS ON SIGNAL PROCESSING 45 (2): 434-444, FEB     1997. -   [3] Joint Approximate Diagonalization of Positive Definite Hermitian     Matrices, Pham D. T., SIAM J. on Matrix Anal. and Appl., 22(4):     1136-1152, 2001. -   [4] A Fast Algorithm for Joint Diagonalization with Non-orthogonal     Transformations and its Application to Blind Source Separation,     Ziehe A., Laskov P., Nolte G., Müller K.-R., Journal of Machine     Learning Research 5 (2004) 777-800. -   [5] Quadratic Optimization for Simultaneous Matrix Diagonalization,     Voligraf R., Obermayer K., IEEE Trans. on Signal Processing, vol.     54, no. 9, pp. 3270-3278, September 2006. -   [6] Nonorthogonal Joint Diagonalization Algorithm Based on     Trigonometric Parametrization, Wang F., Liu Z., Zhang J., IEEE     Trans. on Signal Processing, vol. 55, no. 11, pp. 5299-5308,     November 2007. -   [7] Simple LU and QR Based Non-Orthogonal Matrix Joint     Diagonalization, B. AFSARI, Independent Component Analysis and Blind     Source Separation, Lecture Notes in Computer Science, vol. 3889, pp.     1-7, 2006. -   [8] Matrix Computations, G. H. GOLUB, C. F. VAN LOAN, Baltimore,     Johns Hopkins University Press, 3rd edition, 1996. -   [9] What Can Make Joint Diagonalization Difficult?, B. AFSARI,     ICASSP 2007, Vol. 3, pp. III-1377-1380, 15-20 Apr. 2007. -   [10] Sensitivity Analysis for the problem of Non-Orthogonal Matrix     Joint Diagonalization, B. AFSARI, Submitted to SIAM J. on Matrix     Computations. -   [11] Non-Orthogonal Joint Diagonalization in the Least-Squares Sense     with Application in Blind Source Separation, A. YEREDOR, IEEE Trans.     on Signal Processing, vol. 50, no. 7, pp. 1545-1553, July 2002. -   [12] Sur la diagonalisation conjointe approchée par un critere des     moindres carrés, S. DEGERINE, in Proceedings XVIII Colloque Gretsi     2001, n2, pp. 311-314, Toulouse, France, September 2001. 

1. A method for separating mixed signals (x₁(t), . . . , x_(N)(t)) recorded by at least two sensors into a plurality of component signals (s₁(t), . . . , s_(M)(t)), each mixed signal x_(n)(t) corresponding to a linear combination of the component signals (s₁(t), . . . , s_(M)(t)) stemming from independent sources, each mixed signal being able to be written as the product of a mixture matrix (A) multiplied by a vector of the component signals stemming from the sources (s₁(t), . . . , s_(M)(t)) wherein the method further comprises a joint diagonalization step (300) wherein the mixture matrix is estimated from a product of rotation matrices corresponding to one or more Givens rotation matrices (G(θ)) and to one or more hyperbolic rotation matrices (H(φ)).
 2. The method according to claim 1, wherein the Givens rotation matrices include elements corresponding to trigonometric functions of θ and hyperbolic rotation matrices include elements corresponding to hyperbolic functions of φ.
 3. The method according to claim 2, wherein the diagonalization step comprises the multiplication of a family of estimated matrices ({circumflex over (R)}_(k)) by the product of a Givens rotation matrix and of a hyperbolic rotation matrix on the one hand and the transposed matrix of this product on the other hand, and the estimation of the θ_(n) and φ_(n) values with which the non-diagonal terms of the resulting matrices of order n of this multiplication may be minimized.
 4. The method according to claim 3, wherein the multiplication step is repeated on resulting matrices of order n.
 5. The method according to claim 4, wherein the multiplication step is repeated until the Givens angle θ_(n) and the hyperbolic angle φ_(n) are such that cos θ_(n) and cos hφ_(n) respectively tend to one or the corresponding sines to zero.
 6. The method according to claim 1, wherein the estimated matrices are equal to the product of the mixture matrix by any diagonal matrix by the transposed mixture matrix, these for example are matrices of cumulants, or covariance matrices in different frequency bands, or intercorrelation matrices with different time shifts, or covariance matrices estimated over different time intervals.
 7. The method according to claim 1, wherein the Givens rotation matrices comprise case terms on the diagonal, a sin θ term below the diagonal and a −sin θ term above the diagonal, and the hyperbolic rotation matrices comprise cos hφ elements on the diagonal, a sin hφ element below the diagonal and a sin hp element above the diagonal.
 8. The method according to claim 7, wherein the Givens rotation matrices are of the $\quad\begin{pmatrix} {\cos \; \theta} & {{- \sin}\; \theta} \\ {\sin \; \theta} & {\cos \; \theta} \end{pmatrix}$ type and the hyperbolic rotation matrices are of the $\quad\begin{pmatrix} {\cosh \; \phi} & {\sinh \; \phi} \\ {\sinh \; \phi} & {\cosh \; \phi} \end{pmatrix}$ type.
 9. A device for separating mixed signals (x₁(t), . . . , x_(N)(t)) recorded by at least two sensors into a plurality of component signals (s₁(t), . . . , s_(M)(t)), each mixed signal x_(n)(t) corresponding to a linear combination of component signals (s₁(t), . . . , s_(M)(t)) stemming from independent sources, each mixed signal being able to be written as the product of a mixture matrix (A) multiplied by a vector of the component signals stemming from these sources, wherein the device further comprises means for joint diagonalization in which the mixture matrix A is estimated from a product of rotation matrices corresponding to one or more Givens rotation matrices and to one or more hyperbolic rotation matrices.
 10. A computer program product comprising program code instructions recorded on a medium which may be used in a computer, wherein it comprises instructions for applying the method according to any of claims 1 to
 8. 